28 research outputs found

    Local Access to Huge Random Objects Through Partial Sampling

    © Amartya Shankha Biswas, Ronitt Rubinfeld, and Anak Yodpinyanee. Consider an algorithm performing a computation on a huge random object (for example a random graph or a “long” random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally “on-the-fly” (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear, so that sampling the entire object up front would ruin its efficiency. Our first set of results focuses on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdős-Rényi G(n, p) model for all values of p, and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well-bracketed expressions (the Dyck language). Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation to be consistent with each other without the need for communication, and (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree ∆. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n, p) model). The distribution over valid colorings is instead specified via a “huge” input (the underlying graph G) that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for q ≥ 9∆, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.
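
To make the "on-the-fly" idea concrete, here is a minimal sketch of a Vertex-Pair query for G(n, p) that samples each edge only when it is first asked about and remembers the answer. It is a naive illustration, not the implementation from the paper: the class name `LazyGnp`, the method name `vertex_pair`, and the memo dictionary are all assumptions made for this example, and the sketch does not support Next-Neighbor or Random-Neighbor queries efficiently.

```python
import random


class LazyGnp:
    """Toy local-access view of a G(n, p) random graph (illustrative only)."""

    def __init__(self, n, p, seed=0):
        self.n = n
        self.p = p
        self._rng = random.Random(seed)
        # Memo of already-decided pairs: (min(u, v), max(u, v)) -> bool.
        self._edges = {}

    def vertex_pair(self, u, v):
        """Report whether edge {u, v} exists, sampling it on first request."""
        if not (0 <= u < self.n and 0 <= v < self.n):
            raise ValueError("vertex out of range")
        if u == v:
            return False  # no self-loops in this toy model
        key = (min(u, v), max(u, v))
        if key not in self._edges:
            # Independent Bernoulli(p) coin, flipped once per pair.
            self._edges[key] = self._rng.random() < self.p
        return self._edges[key]


if __name__ == "__main__":
    g = LazyGnp(n=10**9, p=0.5, seed=1)
    first = g.vertex_pair(3, 7)
    # Repeated and symmetric queries agree with the first answer.
    print(first == g.vertex_pair(7, 3) == g.vertex_pair(3, 7))
```

Only the queried pairs are ever materialized, so the graph on a billion vertices above costs one dictionary entry. The memo grows with the number of distinct queries, which is one reason this sketch should not be read as matching the paper's resource bounds.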

    Local Computation Algorithms for Spanners

    A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently. Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the input graph, construct the desired spanner, and write the answer to the output tape. However, when considering massive graphs containing millions or even billions of nodes, not only the input graph but also the output spanner might be too large for a single processor to store. To tackle this challenge, we initiate the study of local computation algorithms (LCAs) for graph spanners in general graphs, where the algorithm should locally decide whether a given edge (u,v) in E belongs to the output (sparse) spanner or not. Such LCAs give the user the "illusion" that a specific sparse spanner for the graph is maintained, without ever fully computing it. We present several results for this setting, including:
    - For general n-vertex graphs and for parameter r in {2,3}, there exists an LCA for (2r-1)-spanners with O~(n^{1+1/r}) edges and sublinear probe complexity of O~(n^{1-1/2r}). These size/stretch trade-offs are best possible (up to polylogarithmic factors).
    - For every k >= 1 and n-vertex graph with maximum degree Delta, there exists an LCA for O(k^2) spanners with O~(n^{1+1/k}) edges, probe complexity of O~(Delta^4 n^{2/3}), and random seed of size polylog(n). This improves upon and extends the work of [Lenzen-Levi, ICALP'18].
    We also complement these constructions by providing a polynomial lower bound on the probe complexity of LCAs for graph spanners that holds even for the simpler task of computing a sparse connected subgraph with o(m) edges. To the best of our knowledge, our results on 3 and 5-spanners are the first LCAs with sublinear (in Delta) probe-complexity for Delta = n^{Omega(1)}.

    Set Cover in Sub-linear Time

    We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in Har-Peled et al. [2016] to the sub-linear query model, that returns an $\alpha$-approximate cover using $\tilde{O}(m(n/k)^{1/(\alpha-1)} + nk)$ queries to the input, where $k$ denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tilde{\Omega}(m(n/k)^{1/(2\alpha)})$, even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require $\Omega(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$. On the other hand, we show that this bound is not optimal for larger values of the parameter $k$, as there exists a $(1+\varepsilon)$-approximation algorithm with $\tilde{O}(mn/k\varepsilon^2)$ queries. We show that this bound is essentially tight for sufficiently small constant $\varepsilon$, by establishing a lower bound of $\tilde{\Omega}(mn/k)$ query complexity.

    Fractional Set Cover in the Streaming Model

    We study the Fractional Set Cover problem in the streaming model. That is, we consider the relaxation of the set cover problem over a universe of n elements and a collection of m sets, where each set can be picked fractionally, with a value in [0,1]. We present a randomized (1+a)-approximation algorithm that makes p passes over the data, and uses O(polylog(m,n,1/a) (mn^(O(1/(pa)))+n)) memory space. The algorithm works in both the set arrival and the edge arrival models. To the best of our knowledge, this is the first streaming result for the fractional set cover problem. We obtain our results by employing the multiplicative weights update framework in the streaming setting.
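
As a rough illustration of how multiplicative weights can drive a fractional cover, the sketch below runs an offline, in-memory variant: elements start with weight 1, each round picks the set with the largest total weight over elements that still need coverage, and the weights of the elements it covers are scaled down by (1 - eps). This is not the streaming algorithm of the paper and makes no claim about passes or memory. The function name `fractional_set_cover_mwu`, the parameter `eps`, and the stopping rule (cover every element `target` times, then divide by `target`) are assumptions chosen for the example. The output is always a feasible fractional cover, but no approximation ratio is claimed here.

```python
import math


def fractional_set_cover_mwu(sets, n, eps=0.1):
    """Return fractional values x[i] in [0, 1], one per set, covering range(n).

    sets: list of Python sets of integers drawn from range(n).
    """
    if set().union(*sets) != set(range(n)):
        raise ValueError("every element must belong to at least one set")

    # Number of times each element must be hit before it is retired.
    target = math.ceil(math.log(max(n, 2)) / eps ** 2)
    weight = [1.0] * n
    hits = [0] * n
    picks = [0] * len(sets)
    active = set(range(n))  # elements not yet hit `target` times

    while active:
        # Choose the set with the largest weight on still-active elements.
        best = max(
            range(len(sets)),
            key=lambda i: sum(weight[e] for e in sets[i] if e in active),
        )
        picks[best] += 1
        for e in sets[best]:
            if e in active:
                hits[e] += 1
                weight[e] *= 1.0 - eps  # multiplicative down-weighting
                if hits[e] >= target:
                    active.discard(e)

    # Scale pick counts into [0, 1]; each element keeps coverage >= 1.
    return [min(1.0, c / target) for c in picks]


if __name__ == "__main__":
    sets = [{0, 1}, {1, 2}, {0, 2}]
    x = fractional_set_cover_mwu(sets, n=3)
    print(x, sum(x))
```

On the three-set instance in the usage example, no integral cover uses fewer than two sets, while picking every set with value 1/2 is a feasible fractional cover of total value 1.5.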

    An extreme case of plant-insect co-diversification: figs and fig-pollinating wasps

    It is thought that speciation in phytophagous insects is often due to colonization of novel host plants, because radiations of plant and insect lineages are typically asynchronous. Recent phylogenetic comparisons have supported this model of diversification for both insect herbivores and specialized pollinators. An exceptional case where contemporaneous plant-insect diversification might be expected is the obligate mutualism between fig trees (Ficus species, Moraceae) and their pollinating wasps (Agaonidae, Hymenoptera). The ubiquity and ecological significance of this mutualism in tropical and subtropical ecosystems has long intrigued biologists, but the systematic challenge posed by >750 interacting species pairs has hindered progress toward understanding its evolutionary history. In particular, taxon sampling and analytical tools have been insufficient for large-scale co-phylogenetic analyses. Here, we sampled nearly 200 interacting pairs of fig and wasp species from across the globe. Two supermatrices were assembled: on average, wasps had sequences from 77% of six genes (5.6 kb), figs had sequences from 60% of five genes (5.5 kb), and overall 850 new DNA sequences were generated for this study. We also developed a new analytical tool, Jane 2, for event-based phylogenetic reconciliation analysis of very large data sets. Separate Bayesian phylogenetic analyses for figs and fig wasps under relaxed molecular clock assumptions indicate Cretaceous diversification of crown groups and contemporaneous divergence for nearly half of all fig and pollinator lineages. Event-based co-phylogenetic analyses further support the co-diversification hypothesis. Biogeographic analyses indicate that the present-day distribution of fig and pollinator lineages is consistent with a Eurasian origin and subsequent dispersal, rather than with Gondwanan vicariance. Overall, our findings indicate that the fig-pollinator mutualism represents an extreme case among plant-insect interactions of coordinated dispersal and long-term co-diversification.

    Sub-linear algorithms for graph problems

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 189-199). In the face of massive data sets, classical algorithmic models, where the algorithm reads the entire input, performs a full computation, then reports the entire output, are rendered infeasible. To handle these data sets, alternative algorithmic models are suggested to solve problems under restricted, namely sub-linear, resources such as time, memory or randomness. This thesis aims at addressing these limitations on graph problems and combinatorial optimization problems through a number of different models. First, we consider the graph spanner problem in the local computation algorithm (LCA) model. A graph spanner is a subgraph of the input graph that preserves all pairwise distances up to a small multiplicative stretch. Given a query edge from the input graph, the LCA explores a sub-linear portion of the input graph, then decides whether to include this edge in its spanner or not; the answers to all edge queries constitute the output of the LCA. We provide the first LCA constructions for 3 and 5-spanners of general graphs with almost optimal trade-offs between spanner sizes and stretches, and for fixed-stretch spanners of low maximum-degree graphs. Next, we study the set cover problem in the oracle access model. The algorithm accesses a sub-linear portion of the input set system by probing for elements in a set, and for sets containing an element, then computes an approximate minimum set cover: a collection of an approximately-minimum number of sets whose union includes all elements. We provide probe-efficient algorithms for set cover, and complement our results with almost tight lower bound constructions. We further extend our study to the LP-relaxation variants and to the streaming setting, obtaining the first streaming results for the fractional set cover problem. Lastly, we design local-access generators for a collection of fundamental random graph models. We demonstrate how to generate graphs according to the desired probability distribution in an on-the-fly fashion. Our algorithms receive probes about arbitrary parts of the input graph, then construct just enough of the graph to answer these probes, using only polylogarithmic time, additional space and random bits per probe. We also provide the first implementation of random neighbor probes, which is a basic algorithmic building block with applications in various huge graph models. by Anak Yodpinyanee. Ph. D.

    LCAs for graphs of non-constant degrees

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014. Cataloged from PDF version of thesis. Includes bibliographical references (pages 41-44). In the model of local computation algorithms (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs on graph problems achieve time and space complexities with very low dependence on n, the number of vertices. Nonetheless, these complexities are generally at least exponential in d, the upper bound on the degree of the input graph. Instead, we consider the case where parameter d can be moderately dependent on n, and aim for complexities with quasi-polynomial dependence on d, while maintaining polylogarithmic dependence on n. In this thesis, we give randomized LCAs for computing maximal independent sets, maximal matchings, and approximate maximum matchings. The time and space complexities of our LCAs on these problems are 2^{O(log^3 d)} polylog(n), 2^{O(log^2 d)} polylog(n) and 2^{O(log^3 d)} polylog(n), respectively. by Anak Yodpinyanee. S.M.

    Local Computation Algorithms for Graphs of Non-constant Degrees

    © 2016, Springer Science+Business Media New York. In the model of local computation algorithms (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs on graph problems achieve time and space complexities with very low dependence on n, the number of vertices. Nonetheless, these complexities are generally at least exponential in d, the upper bound on the degree of the input graph. Instead, we consider the case where parameter d can be moderately dependent on n, and aim for complexities with subexponential dependence on d, while maintaining polylogarithmic dependence on n. We present: a randomized LCA for computing maximal independent sets whose time and space complexities are quasi-polynomial in d and polylogarithmic in n; and, for constant ε > 0, a randomized LCA that provides a (1 - ε)-approximation to maximum matching with high probability, whose time and space complexities are polynomial in d and polylogarithmic in n.
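
For readers new to the LCA model, the sketch below shows the well-known random-order greedy simulation for maximal independent set: every vertex gets a pseudo-random rank, and a vertex is in the set exactly when none of its lower-ranked neighbors is. It is a swapped-in textbook technique meant only to show what "computing the queried part of the output" looks like, not the algorithm of this paper, and no probe bound is claimed for it. The names `rank` and `in_mis`, the hash-based ranking, the shared `seed`, and the `neighbors` callback are assumptions of the example.

```python
import hashlib


def rank(v, seed):
    """Pseudo-random rank of vertex v, fixed by a shared seed."""
    digest = hashlib.sha256(f"{seed}:{v}".encode()).digest()
    # Tie-break on v so that ranks are distinct for distinct vertices.
    return (int.from_bytes(digest[:8], "big"), v)


def in_mis(v, neighbors, seed, cache=None):
    """Answer the query 'is v in the maximal independent set?'.

    neighbors: function mapping a vertex to an iterable of its neighbors.
    Only lower-ranked neighbors are explored, so recursion terminates.
    """
    if cache is None:
        cache = {}
    if v in cache:
        return cache[v]
    r = rank(v, seed)
    lower = sorted(
        (u for u in neighbors(v) if rank(u, seed) < r),
        key=lambda u: rank(u, seed),
    )
    # v joins the set iff no lower-ranked neighbor has joined it.
    result = not any(in_mis(u, neighbors, seed, cache) for u in lower)
    cache[v] = result
    return result


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    answers = {v: in_mis(v, path.__getitem__, seed=42) for v in path}
    print(answers)
```

Because the ranks depend only on the seed and the vertex, separate queries (even from separate processes sharing the seed) describe the same independent set without storing it anywhere.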